cold posterior effect




Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect

Neural Information Processing Systems

The "cold posterior effect" (CPE) in Bayesian deep learning describes the disturbing observation that the predictive performance of Bayesian neural networks can be significantly improved if the Bayes posterior is artificially sharpened using a temperature parameter T < 1. The CPE is problematic in theory and in practice, and since the effect was identified many researchers have proposed hypotheses to explain the phenomenon. However, despite this intensive research effort, the effect remains poorly understood. In this work we provide novel and nuanced evidence relevant to existing explanations for the cold posterior effect, disentangling three hypotheses: 1. The dataset curation hypothesis of Aitchison (2020): we show empirically that the CPE does not arise in a real curated dataset but can be produced in a controlled experiment with varying curation strength. Our results demonstrate how the CPE can arise in isolation from synthetic curation, data augmentation, and bad priors.
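The tempering described in this abstract has a closed form in a toy conjugate model. The sketch below is illustrative only (a Gaussian likelihood with known variance and a Gaussian prior, not the paper's neural-network setting): raising the unnormalized posterior to the power 1/T scales every precision by 1/T, so the posterior mean is unchanged while its variance shrinks by a factor of T.

```python
import numpy as np

def tempered_posterior(data, sigma2=1.0, tau2=1.0, T=1.0):
    """Closed-form tempered posterior over the mean of a Gaussian.

    Tempering raises (likelihood * prior) to the power 1/T; for this
    conjugate model that simply scales all precisions by 1/T.
    """
    n = len(data)
    prec = (n / sigma2 + 1.0 / tau2) / T       # tempered posterior precision
    var = 1.0 / prec
    mean = var * (np.sum(data) / sigma2) / T   # mean is independent of T
    return mean, var

rng = np.random.default_rng(0)
data = rng.normal(0.5, 1.0, size=20)

m_warm, var_warm = tempered_posterior(data, T=1.0)
m_cold, var_cold = tempered_posterior(data, T=0.5)
assert var_cold < var_warm  # T < 1 sharpens (concentrates) the posterior
```

In this toy model a cold posterior only expresses higher confidence around the same estimate; the CPE is the empirical observation that, in deep networks, this sharpening also improves predictive performance.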


A Further Related Work

Neural Information Processing Systems

Motivated by the behavior of Bayesian inference in misspecified models, Grünwald et al. (2017) and Jansen (2013) extensively studied so-called "generalized" Bayesian inference. However, these works consider only "warm posteriors": in Grünwald et al. (2017) the prior favours simple models, hence it is beneficial to put more weight on the prior and use a warm posterior. Finally, we mention the work of Bhattacharya et al. (2019), in which the authors develop fractional posteriors. Popular benchmark datasets, such as CIFAR-10, have been collected and curated. The Street View House Numbers dataset (Netzer et al., 2011) is divided into training, testing, and extra sets. In CIFAR-10 (Krizhevsky and Hinton, 2009), labellers followed strict guidelines to ensure high-quality labelling of the images. In particular, labellers were instructed that "it's worse to include one that shouldn't be included than to exclude one." In this section we review the basics of (SG-)MCMC inference.
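The (SG-)MCMC inference reviewed here can be illustrated with a tempered Langevin sampler on a toy Gaussian-mean posterior. This is a hedged sketch, not any paper's actual experimental setup: it uses the full gradient of the log-posterior (true SGLD would use a minibatch gradient estimate), and it scales the injected noise by the temperature T so the chain targets the tempered posterior.

```python
import numpy as np

def sgld_sample(data, steps=20000, eps=1e-3, T=1.0,
                sigma2=1.0, tau2=1.0, seed=0):
    """Langevin sampler for a Gaussian-mean posterior (full-gradient sketch).

    One step: theta += (eps/2) * grad_log_post(theta) + N(0, eps * T).
    Scaling the injected noise variance by T targets the tempered posterior,
    whose variance is T / (n/sigma2 + 1/tau2) in this conjugate model.
    """
    rng = np.random.default_rng(seed)
    n, s = len(data), np.sum(data)
    theta, samples = 0.0, []
    for t in range(steps):
        grad = (s - n * theta) / sigma2 - theta / tau2  # d/dtheta log p(theta|D)
        theta += 0.5 * eps * grad + rng.normal(0.0, np.sqrt(eps * T))
        if t > steps // 2:          # discard the first half as burn-in
            samples.append(theta)
    return np.array(samples)

rng = np.random.default_rng(0)
data = rng.normal(0.5, 1.0, size=20)
warm = sgld_sample(data, T=1.0)
cold = sgld_sample(data, T=0.25, seed=1)
assert cold.var() < warm.var()  # colder chain concentrates more tightly
```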



Priors Matter: Addressing Misspecification in Bayesian Deep Q-Learning

van der Vaart, Pascal R., Yorke-Smith, Neil, Spaan, Matthijs T. J.

arXiv.org Artificial Intelligence

Uncertainty quantification in reinforcement learning can greatly improve exploration and robustness. Approximate Bayesian approaches have recently been popularized to quantify uncertainty in model-free algorithms. However, so far the focus has been on improving the accuracy of the posterior approximation, instead of studying the accuracy of the prior and likelihood assumptions underlying the posterior. In this work, we demonstrate that there is a cold posterior effect in Bayesian deep Q-learning, where contrary to theory, performance increases when reducing the temperature of the posterior. To identify and overcome likely causes, we challenge common assumptions made on the likelihood and priors in Bayesian model-free algorithms. We empirically study prior distributions and show through statistical tests that the common Gaussian likelihood assumption is frequently violated. We argue that developing more suitable likelihoods and priors should be a key focus in future Bayesian reinforcement learning research and we offer simple, implementable solutions for better priors in deep Q-learning that lead to more performant Bayesian algorithms.
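The claim that the common Gaussian likelihood assumption is "frequently violated" can be probed with simple diagnostics. The sketch below is not the paper's actual statistical test: it is a moment-based normality check (sample excess kurtosis) on synthetic residuals, with a Student-t sample standing in for the kind of heavy-tailed TD errors a Q-learning critic might produce.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: 0 for Gaussian data, > 0 for heavy tails."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(1)
gauss_resid = rng.normal(size=50_000)            # matches the Gaussian assumption
heavy_resid = rng.standard_t(df=3, size=50_000)  # heavy-tailed stand-in for TD errors

assert abs(excess_kurtosis(gauss_resid)) < 0.1   # consistent with Gaussianity
assert excess_kurtosis(heavy_resid) > 1.0        # Gaussian assumption violated
```

A formal test (e.g. D'Agostino-Pearson or Shapiro-Wilk) would attach a p-value to the same discrepancy; the kurtosis gap alone already shows why a Gaussian likelihood can be badly misspecified.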


Appendix for On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification

Neural Information Processing Systems

Overall, properly representing aleatoric uncertainty is a challenging but fundamentally important consideration in Bayesian classification. We have shown that posterior tempering provides a mechanism to more honestly represent our beliefs about aleatoric uncertainty, especially in the presence of data augmentation. In general, as in Wilson and Izmailov [62], we should not be alarmed if T = 1 is not optimal in sophisticated models on complex real-world datasets. Moreover, we have shown how other mechanisms to represent aleatoric uncertainty, such as the noisy Dirichlet model, do not suffer from a cold posterior effect in the presence of data augmentation. Indeed, while an interesting phenomenon, cold posteriors should not be conflated with the success or failure of Bayesian deep learning.




Cold Posteriors through PAC-Bayes

Pitas, Konstantinos, Arbel, Julyan

arXiv.org Machine Learning

We investigate the cold posterior effect through the lens of PAC-Bayes generalization bounds. We argue that in the non-asymptotic setting, when the number of training samples is (relatively) small, discussions of the cold posterior effect should take into account that approximate Bayesian inference does not readily provide guarantees of performance on out-of-sample data. Instead, out-of-sample error is better described through a generalization bound. In this context, we explore the connections between the ELBO objective from variational inference and the PAC-Bayes objectives. We note that, while the ELBO and PAC-Bayes objectives are similar, the latter naturally contain a temperature parameter $\lambda$ which is not restricted to be $\lambda=1$. For both regression and classification tasks, in the case of isotropic Laplace approximations to the posterior, we show how this PAC-Bayesian interpretation of the temperature parameter captures the cold posterior effect.
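The role of the temperature $\lambda$ can be made concrete in a toy conjugate model. The following is a hedged sketch, not the paper's setup: for a Gaussian variational family and a Gaussian-mean likelihood, weighting the KL term of the ELBO by $\lambda < 1$ moves the optimal posterior variance toward a sharper, "colder" posterior.

```python
import numpy as np

def lambda_elbo(m, v, data, lam, sigma2=1.0, tau2=1.0):
    """ELBO with a PAC-Bayes-style temperature on the KL term:
    E_q[log p(D|theta)] - lam * KL(q || p),
    for q = N(m, v), prior p = N(0, tau2), Gaussian likelihood (var sigma2)."""
    n = len(data)
    exp_loglik = (-0.5 * np.sum((data - m) ** 2) / sigma2
                  - 0.5 * n * v / sigma2
                  - 0.5 * n * np.log(2 * np.pi * sigma2))
    kl = 0.5 * (v / tau2 + m ** 2 / tau2 - 1.0 + np.log(tau2 / v))
    return exp_loglik - lam * kl

rng = np.random.default_rng(0)
data = rng.normal(0.5, 1.0, size=20)
grid = np.linspace(1e-3, 0.5, 2000)  # candidate posterior variances

def best_var(lam):
    """Grid-search the variance that maximizes the tempered ELBO."""
    vals = [lambda_elbo(data.mean(), v, data, lam) for v in grid]
    return grid[int(np.argmax(vals))]

assert best_var(0.3) < best_var(1.0)  # smaller lambda -> sharper posterior
```

In this model the optimum is available in closed form, $v^* = \lambda / (n/\sigma^2 + \lambda/\tau^2)$, which the grid search recovers; the point of the sketch is only that $\lambda$ acts directly on posterior concentration, mirroring the temperature in cold posteriors.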